Abstract

Given an undirected graph $G = (V, E)$ on $n$ vertices and $m$ edges, and an integer $t > 1$, a
subgraph $(V, E_S)$, $E_S \subseteq E$, is called a $t$-spanner if for any pair of vertices
$u, v \in V$, the distance between them in the subgraph is at most $t$ times the actual distance.
We present streaming algorithms for computing a $t$-spanner with essentially optimal size-stretch
trade-offs for any undirected graph.

Our first algorithm is for the classical streaming model and works for unweighted graphs
only. The algorithm performs a single pass over the stream of edges and requires $O(m)$ time
to process the entire stream. This drastically improves on the previous best single-pass
streaming algorithm for computing a $t$-spanner, which requires $\Theta(mn^{2/t})$ time to process
the stream and computes a spanner of size slightly larger than optimal.

Our second algorithm is for the StreamSort model introduced by Aggarwal et al. [2], which
is the streaming model augmented with a sorting primitive. The StreamSort model has been
shown to be more powerful than the streaming model while remaining realistic for massive
data set applications. Our algorithm, which works for weighted graphs as well, performs $O(t)$
passes using only $O(\log n)$ bits of working memory.

Both of our algorithms require only elementary data structures.

1 Introduction



A spanner is a (sparse) subgraph of a given graph that preserves approximate distances between
each pair of vertices. More formally, a $t$-spanner of a graph $G = (V, E)$, for any
$t \in \mathbb{N}$, is a subgraph $(V, E_S)$, $E_S \subseteq E$, such that, for any pair of vertices,
their distance in the subgraph is at most $t$ times their distance in the original graph. The
parameter $t$ is called the stretch factor of the $t$-spanner. The concept of spanners was defined
formally by Peleg and Schäffer [25], though the associated notion was used implicitly by Awerbuch [5]
in the context of network synchronizers. Since then, spanners have found numerous applications in
distributed systems, communication networks, and all-pairs approximate shortest paths [5, 9, 26, 27].

Each application of spanners requires, for a specified $t \in \mathbb{N}$, a $t$-spanner of the
smallest possible size (number of edges). Based on the famous girth conjecture of Erdős [17],
Bollobás [11], and Bondy and Simonovits [12], it follows that for any $k \in \mathbb{N}$, there are
graphs on $n$ vertices whose $(2k-1)$-spanner or $2k$-spanner requires $\Omega(n^{1+1/k})$ edges. The
conjecture has been proved for $k = 1, 2, 3$ and $5$. Note that the conjectured worst-case lower bound
is the same for stretch $2k$ and $2k-1$, and by definition, a $(2k-1)$-spanner is also a $2k$-spanner.
Therefore, from the perspective of an algorithmist, the aim is to design an efficient algorithm that
computes a $(2k-1)$-spanner of size $O(n^{1+1/k})$ for any given graph.

For unweighted graphs, Halperin and Zwick [21] designed a deterministic $O(m)$ time algorithm
to compute a $(2k-1)$-spanner of $O(n^{1+1/k})$ size. However, for weighted graphs, it took a series of
improvements [4, 6, 14, 29, 8, 7] until an expected $O(m)$ time algorithm for computing a
$(2k-1)$-spanner could be designed. This linear-time randomized algorithm [8, 7] computes a
$(2k-1)$-spanner of size $O(kn^{1+1/k})$ for a given weighted graph. Recently Roditty et al. [28]
derandomized this algorithm.

In this paper, we consider the problem of computing a $(2k-1)$-spanner in the streaming model
and its recently extended variant, StreamSort. These models capture the complexities of algorithms
designed for massive data set applications more accurately, and are thus receiving increasing
attention. Our algorithms for computing spanners are significantly superior to the previously
existing ones, and are arguably optimal. We now briefly describe the streaming model,
the StreamSort model, and the motivation for computing spanners in a streaming environment. Then
we present the bounds of previously existing streaming algorithms for spanners, and our new results.

1.1 Streaming model 

The streaming model [22] has the following two characteristics: first, the input data can be
accessed only sequentially (in the form of a stream); second, the working memory is considerably
smaller than the size of the entire input stream. So an algorithm in this model can only make a few
passes over the input stream to solve the problem at hand. The sequential access to the
data and the small working memory enforce the following restriction: during a pass, a data
item once evicted from the memory cannot be brought back into the working memory. It is due to this
restriction that the streaming model is more stringent than other models, namely various external
memory models [1, 30] and models for competitive analysis of algorithms [23].

The features and restrictions of the streaming model are motivated by various technological
factors pertaining to massive data set applications. Owing to its sheer size, along with various
practical and economic reasons, the input data of a massive data set application resides on
secondary and tertiary storage devices. These devices are optimized for sequential access and impose
substantial penalties (seek times, cache misses, pipeline stalls) for non-sequential data access. So
an efficient algorithm in this model should make a small number of sequential passes over the input
data with a small working memory. The number of passes and the size of working
memory are the two parameters associated with a streaming algorithm. An additional parameter is
the processing time per data item. These three parameters capture the efficiency criteria for a
streaming algorithm.

This model is currently receiving a lot of attention owing to emerging massive data set applications.
Earlier work in this model focused on problems related to computing order statistics,
outliers, and histograms [3, 13, 20, 22]. Recently, much attention has been given to solving graph
problems in this model, for example, approximate distances, spanners, and matching [18, 19, 24].
A typical graph problem in the streaming model involves making one or more sequential passes
over the stream of edges.

Aggarwal et al. [2] introduced an extension of the streaming model, called the StreamSort model,
which is more powerful than the streaming model and still very practical. An algorithm in this model
performs two kinds of passes: stream passes and sort passes. A stream pass sequentially reads the
input stream, processes it with its limited memory, and produces an output stream. During the
pass, the output stream is written left to right, and a data item once written cannot be erased. A
sort pass sorts a stream according to some well-defined order and produces a sorted
stream as output. The output stream of one pass can be used as the input stream for the next
(stream or sort) pass. An algorithm in the StreamSort model thus performs a few stream passes and a
few sort passes to solve a computational problem. In a slightly simpler variant of the StreamSort
model, Demetrescu et al. [15] presented streaming algorithms for undirected connectivity and
shortest paths problems which achieve a near-optimal trade-off between space and the number of passes.

1.2 Computing a spanner in streaming environment and new results 

Being a fundamental problem in its own right, computing spanners in a streaming
environment is a significant problem. This problem has recently gained more relevance due to the
all-pairs approximate shortest paths (APASP) problem in a streaming environment. Because of the sheer
size, it is just not feasible to compute or store all-pairs distances in a streaming environment for
graphs appearing in massive data set applications. So one settles for approximate shortest paths to
save space. A result of Thorup and Zwick [29] shows that any data structure capable of answering
$(2k-1)$-approximate distance queries needs $\Omega(n^{1+1/k})$ space. The result obviously holds for
the streaming environment too. If we can compute a $(2k-1)$-spanner efficiently in a streaming
environment, it can be employed to solve the APASP problem in the streaming environment in the
following way: for any pair of vertices, just explore the spanner to report the approximate distance.
Feigenbaum et al. [19] took this approach for all-pairs approximate distances in a streaming environment.

We note that a $k$-pass streaming algorithm for a $(2k-1)$-spanner of
size $O(kn^{1+1/k})$ for any weighted graph is implicit in the algorithm of [8, 7], and the processing
time for each edge is just $O(1)$ during each pass. The working memory required has size
$O(kn^{1+1/k})$. Since each pass is a time-consuming task, it is always desirable to have a single-pass
algorithm for computing a $(2k-1)$-spanner. For such an algorithm, one would also aim to keep the
processing time per edge bounded by a constant. Feigenbaum et al. [19] made a step in this
direction. As a main result of their paper, they present a single-pass streaming algorithm (Theorem
2.1, [19]) for computing a $t$-spanner for any unweighted graph. Though they do not mention it, their
algorithm is indeed an adaptation of the algorithm of [8, 7] to the streaming environment. However,
the bounds their algorithm achieves are suboptimal: for any $k \in \mathbb{N}$, their algorithm computes a
$(2k+1)$-spanner of expected size $O(kn^{1+1/k})$ and requires expected $O(k^2 n^{1/k})$ processing time
per edge. Note that the size of the spanner thus computed is away from the optimal by a factor
polynomial in $n$: stretch $2k+1 = 2(k+1)-1$ calls for size $O(n^{1+1/(k+1)})$, so the gap is about
$n^{1/(k(k+1))}$.

In this paper, we succeed in achieving optimal bounds and size-stretch trade-offs for computing
a $(2k-1)$-spanner in a streaming environment. We achieve the following two results.

1. Given any unweighted undirected graph and $k \in \mathbb{N}$, a $(2k-1)$-spanner of expected
$O(kn^{1+1/k})$ size can be computed in the classical streaming model with a single pass and $O(m)$
processing time for the entire stream (amortized constant processing time per edge).

Remark. The algorithm at each stage maintains a $(2k-1)$-spanner of the graph seen so far.
Therefore, it can also be viewed as a partially dynamic (incremental) algorithm for computing
a $(2k-1)$-spanner of an unweighted graph with amortized $O(1)$ time per edge insertion (the
same observation, but with inferior bounds, holds for the earlier algorithm of [19]).

If the edges appear in the stream sorted in nondecreasing order of their weights, our algorithm,
without any modification at all, works for weighted graphs as well. As a result, it
requires one sort pass followed by a stream pass in the StreamSort model for computing a
$(2k-1)$-spanner of expected $O(kn^{1+1/k})$ size for any $k \in \mathbb{N}$ and any weighted graph. Note
that the working memory has size of the order of the spanner size and, though larger than $n$, is
indeed optimal for the classical streaming model.

2. Given a weighted undirected graph and $k \in \mathbb{N}$, a $(2k-1)$-spanner of expected
$O(kn^{1+1/k})$ size can be computed in the StreamSort model in $O(k)$ passes in total and with only
$O(\log n)$ bits of working memory. Furthermore, each stream pass in this algorithm spends just $O(1)$
time per edge.

We would also like to mention that the algorithms presented in this paper employ elementary data
structures (linked lists and arrays). The algorithms (and their analysis) presented in this paper are
self-contained.

Remark. Elkin and Zhang [16] address the problem of computing a $(1+\epsilon, \beta)$-spanner in a
streaming environment. Their algorithm, though it sheds some light on the APASP problem in a streaming
environment, has little practical relevance. This is because the number of passes required, though
constant, depends quite heavily on $\epsilon$ and $\beta$.

2 Preliminaries 

We assume, like the previous algorithms [18, 19], that n, the number of vertices is known in advance 
and the vertices are numbered from 1 to n. 

As mentioned in the introduction, our algorithm is basically a careful adaptation of the previous 
static linear time algorithms [8, 7, 10] in the streaming environment. The central idea of these 
algorithms is clustering which we define below. 

Definition 2.1 A cluster is a subset of vertices, and a clustering C, is a union of disjoint clusters. 
Each cluster will have a unique vertex which will be called its center. 

The uniqueness of the center of a cluster can be used to represent a clustering $C$ as an array (with
the same label $C$) of size $n$ in the following way: $C(v)$ denotes the center of the cluster containing
$v$, unless $v$ does not belong to any cluster, in which case $C(v) = 0$. We shall say that a cluster
$c$ is incident on or adjacent to a vertex $u$ if there is some vertex $v \in c$ adjacent to $u$. With
respect to a given clustering $C$, a vertex $u$ is said to be a clustered vertex if it belongs to some
cluster in $C$, and an unclustered vertex otherwise.

The role of clustering in achieving a small spanner can be described intuitively as follows.
Suppose we can partition the vertices into a small number of disjoint clusters, and span each of
these clusters by a small set $\mathcal{E} \subseteq E$. As a consequence of this clustering, each vertex
$u \in V$ has all its neighbors grouped in various clusters. Among those edges that are incident on $u$
from the same cluster, say $c$, selecting just one edge will ensure the following property. For each
missing edge $(u, v)$ such that $v \in c$, there is a path connecting $u$ and $v$ using one of the
selected edges and some edges from $\mathcal{E}$, and the length of this path is at most one unit more
than the diameter of the cluster containing $v$. (In order to ensure a small bound on the stretch, we
need these clusters to have very small diameter.) This simple idea of pruning edges lies at the core of
the static algorithms of [8, 7, 10], and to materialize it they build a multilevel clustering using
random sampling.
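
To make the array representation and the one-edge-per-cluster idea concrete, here is a minimal Python sketch. It works on an in-memory neighbor list rather than a stream, and the names `make_clustering` and `one_edge_per_cluster`, as well as the dictionary input format, are illustrative assumptions and not part of the paper.

```python
def make_clustering(n, clusters):
    """Build the array C (index 0 unused): C[v] is the center of v's cluster, or 0.

    `clusters` maps each center to the vertices of its cluster (hypothetical input format).
    """
    C = [0] * (n + 1)
    for center, members in clusters.items():
        for v in members:
            assert C[v] == 0, "clusters must be disjoint"
            C[v] = center
    return C


def one_edge_per_cluster(u, neighbors, C):
    """Keep one edge (u, v) per cluster adjacent to u; edges to unclustered vertices are kept."""
    kept = {}    # cluster center -> the single edge retained for that cluster
    loose = []   # edges to unclustered neighbors: nothing to prune against
    for v in neighbors:
        c = C[v]
        if c == 0:
            loose.append((u, v))
        elif c not in kept:
            kept[c] = (u, v)
    return list(kept.values()) + loose
```

With clusters {1, 2, 3} (center 1) and {5, 6} (center 5), vertex 4 with neighbors 2, 3, 5, 6 keeps only the edges (4, 2) and (4, 5).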



3 Algorithm for $(2k-1)$-spanners in classical streaming model

Prior to processing the stream of edges, the algorithm constructs an initial $(k+1)$-level hierarchy of
clusterings $\{C_i \mid 0 \le i \le k\}$ for the empty (without edges) graph as follows.



Initializing the $(k+1)$ levels of clusterings



Let $S_0 \leftarrow V$, $S_k \leftarrow \emptyset$
For $0 < i < k$,
    $S_i$ contains each element of set $S_{i-1}$ independently with prob. $n^{-1/k}$
For $0 \le i \le k$,
    $C_i \leftarrow \{\{v\} \mid v \in S_i\}$
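
The initialization can be sketched in Python as follows. This is a minimal sketch under the assumption that vertices are numbered 1 to n and that the random source is passed in explicitly; the function name `init_clusterings` and its parameters are illustrative.

```python
import random


def init_clusterings(n, k, rng):
    """Return (S, C): nested samples S[0..k] and arrays C[0..k] with C[i][v] in {v, 0}."""
    p = n ** (-1.0 / k)                       # sampling probability n^(-1/k)
    S = [set(range(1, n + 1))]                # S_0 = V
    for i in range(1, k):
        # S_i keeps each element of S_{i-1} independently with probability p
        S.append({v for v in sorted(S[i - 1]) if rng.random() < p})
    S.append(set())                           # S_k is empty
    # Every cluster starts as a singleton {v} centered at v (index 0 is unused).
    C = [[v if v in S[i] else 0 for v in range(n + 1)] for i in range(k + 1)]
    return S, C
```

Passing a seeded `random.Random` instance makes a run reproducible, which is convenient when experimenting with the rest of the algorithm.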



We introduce two notations at this point.

$\ell(v)$ : the highest level of the clustering in which $v$ is present as a clustered vertex.

$\ell_S(v)$ : the highest level $i < k$ such that the cluster centered at $v$ is a sampled cluster in $C_i$.

Note that, in the beginning, $\ell_S(v) = \ell(v)$ for all the vertices. However, as the edges are
processed, the level $\ell(v)$ of a vertex might rise.

We shall now give an overview and intuition of the algorithm. Initially, at each level $i < k$, every
cluster is a singleton set. From the viewpoint of clustering, the only change in a cluster during the
algorithm is that other vertices (from levels lower than the cluster) might join it. We shall
always use the following convention: a cluster $c \in C_i$ is a sampled cluster if, at the beginning of
the algorithm, the corresponding singleton cluster was a sampled cluster. The following assertion will
hold throughout.



A : For each $c \in C_{i+1}$, there exists a sampled cluster $c' \in C_i$ such that $c' \subseteq c$.



Now we describe the way the stream of edges is processed by the algorithm, and how the
clustering evolves by upward movement of vertices. Each vertex $u \in V$ waits at its present level
$\ell(u)$ for an opportunity to move to a level higher than $\ell(u)$, and the only opportunity for it to
move higher is when it receives an edge incident from some sampled cluster in $C_{\ell(u)}$. We shall
explain soon how this tendency of vertices to rise to higher levels proves crucial for computing a sparse
$(2k-1)$-spanner. It follows from assertion A that a sampled cluster $c \in C_i$ has some
$c' \in C_{i+1}$ such that $c \subseteq c'$. Whenever $u$ gets such an edge, it hooks itself to the sampled
cluster $c$ to join (become a member of) cluster $c'$ (so $c'$ gets updated accordingly). In case $c$
appears as a sampled cluster at the next level also, the vertex $u$ joins the next-level parent as well.
As follows from the sampling involved in building the hierarchy of clusterings, only very few of the
clusters at any level are sampled clusters. So a vertex gets an opportunity to become adjacent to a
sampled cluster on very few occasions, and until then it adds edges to the spanner in a frugal manner
using the idea of clustering, as follows. Let the vertex $v$ be a member of only unsampled clusters at
level $\ell(u)$. Let $c$ be the cluster at level $\ell(v)$ in which $v$ is present. In this case, the
vertex $u$ adds the edge $(u, v)$ to the spanner only if $c$ was not adjacent to $u$ earlier. Vertex $u$
keeps a list storing one edge from each cluster of $C_i$ that is adjacent to it. Now, in order to
determine whether the cluster $c$ was incident on $u$ before the edge $(u, v)$, one could explore the
entire list of edges incident from various clusters at level $\ell(v)$, which could be quite large.
(Feigenbaum et al. [19] used this brute-force search.) In order to achieve amortized $O(1)$ time, we
adopt a buffering approach in which we keep a buffer storing the edges at each level temporarily. The
vertex $u$ initially adds the edge $(u, v)$ to its temporary buffer at level $\ell(v)$, and prunes this
set once it holds a sufficiently large number of edges using the procedure Prune$(u, i)$.

A vertex's tendency to move to higher levels proves crucial for computing a sparse $(2k-1)$-spanner
in the following way. At lower levels, there are a large number of clusters, so we cannot afford to add
edges from a vertex to all these clusters. As more and more clusters at level $i > \ell(u)$ become
adjacent to $u$, one of them might be a sampled cluster. Since a sampled cluster is present at the higher
level too (see assertion A), getting hooked to a sampled cluster pays off for $u$ in the sense that it
moves to a higher level where there are fewer clusters. At level $k-1$, there are in expectation $n^{1/k}$
clusters, and once $u$ reaches this level, it can afford to add a single edge to each of its neighboring
clusters.

Having given an intuitive and informal description of the algorithm, we now present
the algorithm and the associated data structures formally.

Data structure : We shall use $k$ arrays $C_i$, $i < k$, to store the clustering at each level. As
mentioned earlier, $C_i(u)$ stores the center of the cluster in $C_i$ containing $u$. In case $u$ is not
clustered at level $i$, $C_i(u)$ stores 0. Each vertex $u \in V$ keeps lists $Temp(u)$ and
$\mathcal{E}(u)$. The list $\mathcal{E}(u)$ stores edges incident on $u$ from unsampled clusters at level
$\ell(u)$, and $Temp(u)$ acts as a buffer for these edges, which we purge once the number of edges in
$Temp(u)$ exceeds the number of edges in $\mathcal{E}(u)$.

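Processing an edge $(u, v)$ from the stream



1. Assigning the edge to the endpoint at lower level

    If $\ell(u) > \ell(v)$, then swap$(u, v)$.
    $i \leftarrow \ell(u)$, $x \leftarrow C_i(v)$, $h \leftarrow \ell_S(x)$

2. Processing the edge

    If $h > i$
        2.1 For $j = i+1$ to $h$, do
                $C_j(u) \leftarrow x$
        2.2 $\ell(u) \leftarrow h$
        2.3 $\mathcal{E} \leftarrow \mathcal{E} \cup Temp(u) \cup \mathcal{E}(u) \cup \{(u, v)\}$
        2.4 $Temp(u) \leftarrow \emptyset$, $\mathcal{E}(u) \leftarrow \emptyset$
    Else
        2.5 $Temp(u) \leftarrow Temp(u) \cup \{(u, v)\}$
        2.6 If $|Temp(u)| > |\mathcal{E}(u)|$, then Prune$(u, i)$.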
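
The edge-processing steps can be sketched in Python as follows. This is one reading of the pseudocode rather than a definitive implementation: it assumes that when a vertex rises, its two lists are flushed into the committed edge set, and it takes the initial center levels as an input array `level_of_center` (a hypothetical parameter standing in for the sampled-level notation) so that runs are reproducible. For brevity, `prune` uses a Python set where the paper uses a scratch array. The class and method names are illustrative.

```python
class StreamSpanner:
    """One-pass spanner sketch for vertices 1..n (index 0 of every array is unused)."""

    def __init__(self, n, k, level_of_center):
        self.ls = level_of_center              # highest level < k at which v is a center
        self.lvl = list(level_of_center)       # current level of v; initially the same
        # C[i][v]: center of v's cluster at level i, or 0 if v is unclustered there
        self.C = [[v if v > 0 and level_of_center[v] >= i else 0 for v in range(n + 1)]
                  for i in range(k)]
        self.temp = [[] for _ in range(n + 1)]  # Temp(u): buffer of unpruned edges
        self.kept = [[] for _ in range(n + 1)]  # one retained edge per adjacent cluster
        self.E = set()                          # edges committed when a vertex rises

    def process(self, u, v):
        if self.lvl[u] > self.lvl[v]:           # step 1: assign edge to lower endpoint
            u, v = v, u
        i = self.lvl[u]
        x = self.C[i][v]
        h = self.ls[x]
        if h > i:                               # v's cluster at level i is sampled
            for j in range(i + 1, h + 1):       # 2.1: u joins x's cluster up to level h
                self.C[j][u] = x
            self.lvl[u] = h                     # 2.2
            self.E.update(self.temp[u], self.kept[u], [(u, v)])   # 2.3 (assumed flush)
            self.temp[u], self.kept[u] = [], []                   # 2.4
        else:
            self.temp[u].append((u, v))         # 2.5
            if len(self.temp[u]) > len(self.kept[u]):
                self.prune(u, i)                # 2.6

    def prune(self, u, i):
        seen = {self.C[i][w] for (_, w) in self.kept[u]}
        seen.add(self.C[i][u])                  # edges inside u's own cluster are dropped
        for (_, w) in self.temp[u]:
            c = self.C[i][w]
            if c not in seen:
                seen.add(c)
                self.kept[u].append((u, w))
        self.temp[u] = []

    def spanner(self):
        out = set(self.E)
        for edges in self.temp + self.kept:
            out.update(edges)
        return out
```

For example, with n = 4, k = 2 and vertex 1 as the only level-1 center, feeding the edges (2,1), (3,1), (2,3), (4,2), (4,3) discards only (2,3), which is covered by the path 2-1-3 of length 2.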



The If condition in step 2 checks whether there is any sampled cluster containing $v$ at level $\ell(u)$
or higher, and if so, the vertex $u$ joins a cluster. Otherwise, the clustering remains unchanged. It is
easy to observe that assertion A holds after every edge is processed.

The procedure Prune$(u, i)$ : The procedure uses a boolean array $A[1..n]$ as scratch space. The
array $A$ is initialized to 0. First it scans the list $\mathcal{E}(u)$ and sets to 1 the entries in $A$
corresponding to clusters in $C_i$ neighboring $u$. It then scans the edges in the list $Temp(u)$, and
eliminates an edge if the corresponding cluster was already incident; otherwise it adds the edge to
$\mathcal{E}(u)$. Afterwards, we scan the updated list $\mathcal{E}(u)$ once to undo the changes made in
array $A$, so that $A$ is restored to its initial state (all entries set to 0).

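The procedure Prune$(u, i)$



1. For each edge $(u, w) \in \mathcal{E}(u)$, do
        $A[C_i(w)] \leftarrow 1$.

2. For each edge $(u, v) \in Temp(u)$, do
        if $A[C_i(v)] = 0$ and $C_i(u) \ne C_i(v)$,
            2.1 $A[C_i(v)] \leftarrow 1$.
            2.2 $\mathcal{E}(u) \leftarrow \mathcal{E}(u) \cup \{(u, v)\}$.
        $Temp(u) \leftarrow Temp(u) \setminus \{(u, v)\}$.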
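
The scratch-array technique can be sketched as a standalone Python function. The final reset loop follows the prose description of the procedure (the pseudocode box does not list it as a numbered step); the function signature and the representation of edges as pairs (owner, other endpoint) are assumptions made for illustration.

```python
def prune(u, Ci, kept, temp, A):
    """Prune Temp(u) against the retained list, using A as an all-zero scratch array.

    Ci[v] is the center of v's cluster at level i; `kept` and `temp` are lists of
    edges (u, w) and are modified in place. A must be all zeros on entry and is
    all zeros again on exit.
    """
    for (_, w) in kept:                 # step 1: mark clusters already adjacent to u
        A[Ci[w]] = 1
    for (_, v) in temp:                 # step 2: keep the first edge from each new cluster
        c = Ci[v]
        if A[c] == 0 and Ci[u] != c:
            A[c] = 1
            kept.append((u, v))
    temp.clear()
    for (_, w) in kept:                 # undo: touch only the entries that were set
        A[Ci[w]] = 0
```

Resetting only the touched entries keeps the cost of a call proportional to the two list lengths rather than to n.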



Observation 3.1 For each vertex $u \in V$, $|Temp(u)| \le |\mathcal{E}(u)|$ except just before the
invocation of Prune$(u, i)$, when $|Temp(u)|$ exceeds $|\mathcal{E}(u)|$ by one.

3.1 Analyzing the running time 

It takes $O(1)$ time to process an edge except when it invokes Prune(). Let us analyze the total
time spent in a single call of Prune$(u, i)$. It follows from the description of the procedure that the
total time required by Prune$(u, i)$ is of the order of $|\mathcal{E}_i(u)| + |Temp_i(u)|$, which by
Observation 3.1 is $O(|Temp_i(u)|)$. So it suffices to charge $O(1)$ cost to each edge of $Temp_i(u)$ to
account for the time spent in a call of Prune$(u, i)$. Note that an edge is processed only once by
Prune$(u, i)$ while being a member of $Temp_i(u)$. This is because, after the Prune$(u, i)$ procedure,
either the edge gets discarded forever or it becomes a member of $\mathcal{E}_i(u)$. Hence it suffices to
charge $O(1)$ cost to each edge in order to account for the total computational cost of all calls of
Prune() during the algorithm. Hence the total time required for processing the stream of edges is $O(m)$.



Let $\mathcal{E}^+$ be the set $\bigcup_{i<k,\, u \in V} (Temp_i(u) \cup \mathcal{E}_i(u)) \cup \mathcal{E}$ at any stage of the algorithm.

In the following section, we shall prove that the set $\mathcal{E}^+$ at any given moment is a
$(2k-1)$-spanner for the set of edges that have appeared in the stream till that moment, and that its
expected size is $O(kn^{1+1/k})$. This way, the algorithm can also be viewed as an incremental algorithm
for computing a $(2k-1)$-spanner.

4 The stretch and the size of the spanner computed by the algorithm 

4.1 Analysis of the stretch of the spanner 

First we state an important lemma.

Lemma 4.1 Let $c$ be any cluster in $C_i$. Each vertex $v \in c$ is connected to its center through at
most $i$ edges from $\mathcal{E}$.

Proof: The proof is by induction on $i$ and the number of edges of the stream seen so far. Let
$x$ be the center of the cluster $c$. If $c$ is a singleton cluster, there is nothing to prove, so assuming
otherwise, let $u \ne x$ be a vertex which belongs to $c$. Now observe the process by which $u$ joined
the cluster $c$. The vertex $u$ became a member of $c$ only in the situation where an edge $(u, v)$
appeared in the stream with vertex $v$ being a member of some sampled cluster $c'$ in $C_{i-1}$.
Assertion A implies that $c'$ is a subset of $c$ and so has $x$ as its center. Now applying the inductive
hypothesis, there is a path $\subseteq \mathcal{E}$ between $v$ and $x$ of length at most $i-1$. This path
concatenated with the edge $(u, v)$ (also in $\mathcal{E}$) is a path $\subseteq \mathcal{E}$ between $u$
and $x$ of length at most $i$.

□ 

The streaming algorithm processes each edge of the stream and discards a dispensable edge only
through the procedure Prune(). In order to prove that $\mathcal{E}^+$ is a $(2k-1)$-spanner, we shall
show that for each edge $(u, v)$ discarded by the algorithm, there is a path in $\mathcal{E}^+$ of length
at most $2k-1$ that connects $u$ and $v$. Without loss of generality, assume that the edge $(u, v)$ got
discarded during Prune$(u, i)$. Now the edge $(u, v)$ could be discarded only if we had already selected
some other edge $(u, w)$ in $\mathcal{E}_i(u)$ incident from the same cluster in $C_i$ to which $v$
belongs. Lemma 4.1 implies that the center of each cluster in $C_i$ is connected to its members through a
path in $\mathcal{E}$ of length at most $i$. Hence $v$ and $w$, being members of the same cluster, are
connected by a path in $\mathcal{E}$ of length at most $2i$. This path concatenated with the edge
$(u, w) \in \mathcal{E}_i(u)$ is a path in $\mathcal{E}^+$ between $u$ and $v$ of length at most $2i+1$,
which is at most $2k-1$ since $i < k$ always. Hence we can conclude that $\mathcal{E}^+$ at any moment is
a $(2k-1)$-spanner for the set of edges that have appeared in the stream till that
moment.
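
The argument above checks the stretch edge by edge, and the same check is easy to run on small unweighted examples. The following Python sketch, with the hypothetical name `is_t_spanner`, runs a depth-limited breadth-first search in the candidate spanner for every edge of the original graph.

```python
from collections import deque


def is_t_spanner(n, edges, spanner_edges, t):
    """Return True if every edge (a, b) of the graph has a path of length <= t in the spanner.

    Vertices are 1..n; both edge collections hold unordered pairs given as tuples.
    """
    adj = [[] for _ in range(n + 1)]
    for a, b in spanner_edges:
        adj[a].append(b)
        adj[b].append(a)
    for a, b in edges:
        dist = {a: 0}
        queue = deque([a])
        while queue:
            x = queue.popleft()
            if x == b:
                break
            if dist[x] == t:            # do not explore beyond depth t
                continue
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        if b not in dist:
            return False
    return True
```

On the 4-cycle with one edge removed, the remaining path is a 3-spanner but not a 2-spanner.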

4.2 Analyzing the size of the spanner 

In the algorithm, a vertex $u$ contributes edges to $\mathcal{E}$ only when its level $\ell(u)$ increases.
So $|\mathcal{E}| \le n(k-1)$. Let us count the expected number of edges in $\mathcal{E}_i(u)$ and
$Temp_i(u)$. It follows from Observation 3.1 that the number of edges in $Temp_i(u)$ is at most
$|\mathcal{E}_i(u)| + 1$. So it suffices to bound the number of edges in $\mathcal{E}_i(u)$.

First we make an observation. Suppose an edge $(u, v)$ appears in the stream with
$\ell(u) < \ell(v)$, and vertex $v$ does not belong to a sampled cluster at any level from $\ell(u)$
onwards. This edge makes $u$ adjacent to the cluster containing $v$ at level $\ell(v)$. Note from the
algorithm that although the vertex $v$ is clustered at every level from $\ell(u)$ to $\ell(v)$, it is only
the cluster at level $\ell(v)$ which becomes adjacent to $u$ by the edge $(u, v)$. So the sets
$\{\mathcal{E}_i(u)\}$ are always disjoint. It also follows from the procedure Prune() that
$\mathcal{E}_i(u)$ stores one edge per cluster at level $i$ that becomes adjacent to $u$.

We shall give a bound on the expected size of $\mathcal{E}_i(u)$. For any arbitrary but fixed stream of
edges, let $(c_1, c_2, \ldots)$ be the clusters at level $i$ arranged in the chronological order of their
becoming incident on $u$. When a cluster from $C_i$ becomes adjacent to $u$ and the cluster is a sampled
cluster, the vertex $u$ hooks onto that cluster and moves to the next level. It follows from the algorithm
that from this time onwards, $u$ does not add any edge to $Temp_i(u)$ or $\mathcal{E}_i(u)$. So an edge
incident from $c_j$ is selected in $\mathcal{E}_i(u)$ only if none of $c_1, \ldots, c_{j-1}$ was a sampled
cluster. From the sampling of clusters done at the beginning of the algorithm, it follows that each
cluster at level $i$ is a sampled cluster independently with probability $p = n^{-1/k}$. So an edge
incident from $c_j$ on $u$ is added to $\mathcal{E}_i(u)$ with probability at most $(1 - n^{-1/k})^{j-1}$.
Hence the expected number of edges in $\mathcal{E}_i(u)$ is at most
$\sum_{j \ge 1} (1 - n^{-1/k})^{j-1} = n^{1/k}$.

Since there are $n$ vertices, it follows that the expected size of the spanner computed by the streaming
algorithm is $O(kn^{1+1/k})$. Note that vertex $u$ could move to a level higher than $i$ even
when it becomes adjacent to some sampled cluster at some level $> i$. But that would only decrease the
number of edges contributed as analyzed above.
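
The size bound can be assembled in one chain of inequalities. The sketch below only combines facts stated in this section: the bound $|\mathcal{E}| \le n(k-1)$, Observation 3.1, and a per-list bound $\mathbb{E}|\mathcal{E}_i(u)| \le 1/p = n^{1/k}$ coming from the geometric series with $p = n^{-1/k}$. For the top level $i = k-1$, where no cluster is sampled, it assumes the same bound via the expected number $n^{1/k}$ of clusters at that level. Constants are not optimized.

```latex
\begin{align*}
\mathbb{E}\,|\mathcal{E}^+|
  &\le |\mathcal{E}| + \sum_{u \in V} \sum_{i<k}
       \mathbb{E}\bigl(|\mathcal{E}_i(u)| + |Temp_i(u)|\bigr) \\
  &\le n(k-1) + \sum_{u \in V} \sum_{i<k}
       \bigl(2\,\mathbb{E}|\mathcal{E}_i(u)| + 1\bigr) \\
  &\le n(k-1) + nk\,\bigl(2n^{1/k} + 1\bigr) \;=\; O\bigl(kn^{1+1/k}\bigr).
\end{align*}
```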

Theorem 4.1 Given any $k \in \mathbb{N}$, a $(2k-1)$-spanner of expected size $O(\min(m, kn^{1+1/k}))$ for
an unweighted graph can be computed in the streaming model in one pass with amortized constant
processing time per edge. The working memory required is $O(kn^{1+1/k})$.

Now we shall show that the algorithm for the classical streaming model described above also works
for weighted graphs if the edges appear in nondecreasing order of edge weights.
We shall employ the following observation, which follows from the procedure Prune().

Observation 4.1 Consider any vertex $u$, a cluster $c \in C_i$, $i < k$, and the period during which
$\ell(u) \le i$. Among all the edges in the stream that are incident on $u$ from $c$ in this period, the
edge that appears first in the stream is surely present in the spanner.

Proof: Let $(u, v)$, $v \in c$, be the first edge incident on $u$ from $c$ during the period
$\ell(u) \le i$. It is added to $Temp_i(u)$ initially like any other edge. When Prune$(u, i)$ is invoked
in the near future and the edge $(u, v)$ is processed, it is clear that $A[C_i(v)] = 0$, since by
definition there was no edge prior to $(u, v)$ which is incident on $u$ from $c$. Hence $(u, v)$ gets
added to $\mathcal{E}_i(u)$ and subsequently to the spanner. □

Along similar lines, we can infer the following observation.

Observation 4.2 Consider any cluster $c \in C_i$, and let $v$ be a vertex present in $c$. For the period
$\ell(v) = i$, let $E_v$ be the edges that are incident on $v$ from vertices lying at level $\le i$. All
the edges lying on the path from $v$ to the center of $c$ appeared before any edge in the set $E_v$.

Let the edges in the stream appear in nondecreasing order of their weights, and let our single-pass
algorithm (designed for unweighted graphs) process this stream ignoring the edge weights.
We shall show that the spanner computed is also a $(2k-1)$-spanner of the original graph
with weighted edges. Let $(u, w)$ be an edge discarded by the algorithm, and suppose it got
discarded during Prune$(u, i)$, for some $i < k$. Let $w \in c \in C_i$. It follows from Observation 4.1
that there is some edge, say $(u, v)$, $v \in c$, that appeared before $(u, w)$ in the stream and got added
to the spanner. From the arguments used in the proof of Lemma 4.1, it follows that $v$ and $w$ are
connected by a path of at most $2i$ edges from the set $\mathcal{E}$. All these edges and the edge
$(u, v)$ form a path in the spanner of length at most $2i+1$. Using Observations 4.1 and 4.2, it also
follows that all these edges appeared before the edge $(u, w)$ in the stream. Hence each of them is at
most as heavy as $(u, w)$, since the edges appear in the stream in nondecreasing order of their weights.
So there is a path between $u$ and $w$ in the spanner consisting of at most $2i+1$ edges, each being at
most as heavy as $(u, w)$. Hence the spanner is indeed a $(2k-1)$-spanner.

Thus we can conclude that our single-pass streaming algorithm, originally designed for unweighted
graphs, also computes a $(2k-1)$-spanner for a weighted graph provided the edges appear
in nondecreasing order of their weights. So an algorithm for computing a $(2k-1)$-spanner in the
StreamSort model is as follows.

1. First run a sort pass on the input stream $I$, which produces an output stream $O$ where edges
appear in nondecreasing order of their weights.

2. Execute our single-pass algorithm of the earlier section (originally designed for unweighted
graphs) on the stream $O$, ignoring the weights.
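
The two steps can be sketched as a small driver in Python. Here Python's in-memory `sorted` stands in for the sort pass, and `process_edge` is a hypothetical callback representing the single-pass algorithm; both are assumptions made for illustration and not how a real StreamSort system would be organised.

```python
def streamsort_spanner(weighted_edges, process_edge):
    """Run a sort pass followed by one stream pass.

    weighted_edges: iterable of (u, v, weight) triples.
    process_edge:   callback taking (u, v); it never sees the weight.
    """
    # Sort pass: produce the stream O with edges in nondecreasing order of weight.
    ordered = sorted(weighted_edges, key=lambda e: e[2])
    # Stream pass: feed O to the unweighted single-pass algorithm, ignoring weights.
    for (u, v, _weight) in ordered:
        process_edge(u, v)
```

Because the callback receives only the endpoints, any implementation of the unweighted single-pass algorithm can be plugged in unchanged.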

Theorem 4.2 Given any $k \in \mathbb{N}$, a $(2k-1)$-spanner of expected size $O(\min(m, kn^{1+1/k}))$ for a
weighted graph can be computed in the StreamSort model with one sort pass followed by one stream
pass. It requires amortized constant processing time per edge during the stream pass, and the
working memory required is $O(kn^{1+1/k})$.

In the following section we describe an algorithm for computing a $(2k-1)$-spanner in the
StreamSort model which requires only $O(\log n)$ bits of working memory and performs only $O(k)$
passes.

5 Algorithm for $(2k-1)$-spanners in StreamSort model

We shall now present an algorithm for computing a $(2k-1)$-spanner in the StreamSort model. The
algorithm works for weighted graphs as well and requires just $O(\log n)$ bits of working memory
and $O(k)$ alternating passes of streaming and sorting.

The algorithm can be viewed as a streaming version of the static RAM algorithm for computing a
$(2k-1)$-spanner given in [8]. We provide a brief overview of the algorithm below. The algorithm
executes $k$ iterations. Each iteration begins with a partially built spanner $E_S$, a subset of edges
$E'$ for which the decision of including them in the spanner is yet to be made, and a subset
$V' \subseteq V$ such that the endpoints of each edge in $E'$ are present in $V'$. In addition, the $i$th
iteration begins with a clustering $C_{i-1}$ which partitions $V'$ into disjoint clusters such that each
edge in $E'$ is an inter-cluster edge. The clustering $C_{i-1}$ has the following crucial property.

P : For each edge $(u, v) \in E'$, there is a path from $u$ to the center of its cluster in $C_{i-1}$ with
$i-1$ edges, each of weight not more than that of $(u, v)$.

The first iteration begins with $V' = V$, $E' = E$, $E_S = \emptyset$, $C_0 = \{\{v\} \mid v \in V\}$.

Execution of the $i$th iteration selects each cluster from $C_{i-1}$ independently with probability $n^{-1/k}$. This
sampling forms the basis for defining the clustering of the $i$th iteration. Namely, $C_i$ consists of the
clusters sampled in the $i$th iteration, with every vertex not belonging to any sampled cluster joining
its nearest neighboring sampled cluster (if any). In addition, processing of each vertex in $V'$
contributes some edges to the spanner and discards a few in the $i$th iteration. We shall give the exact
description of the $i$th iteration and its execution in the StreamSort model soon. Before that, we need
to preprocess the initial stream of edges and introduce a few key ideas which lead to execution of
the $i$th iteration in the StreamSort model in $O(1)$ passes.
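
A short calculation, not spelled out in the text, shows how quickly the clustering shrinks. It assumes that sampling takes place in each of the first $i$ iterations: every cluster of $C_i$ descends from a sampled cluster of $C_{i-1}$, and $C_0$ consists of $n$ singleton clusters, so the expected number of clusters drops by a factor of $n^{1/k}$ per iteration.

```latex
\mathbb{E}\bigl[\,|C_i|\,\bigr] \;=\; n \cdot \bigl(n^{-1/k}\bigr)^{i} \;=\; n^{1 - i/k}
```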

5.1 Augmenting the initial edge stream, and two sorting primitives 

Our algorithm will receive just a stream of edges. In order to execute our algorithm, we will
associate some more fields with each edge and vertex. We do so as a preprocessing phase of the
algorithm.

Preprocessing of initial edge stream : We preprocess the initial stream of edges to
produce another stream such that for an edge between $u, v \in V$ in the stream, we introduce two
edges denoted as $(u, v)$ and $(v, u)$ in the output stream. We shall use $(u, v)$ to denote the edge
associated with vertex $u$ and $(v, u)$ to denote the edge associated with vertex $v$.
In addition, we augment the data structure of each edge $(u, v)$ with the following additional fields.

• lcenter and rcenter : storing the centers of the clusters to which $u$ and $v$ belong in the present clustering.
Since the initial clustering is $\{\{x\} \mid x \in V\}$, initially $C(u) \leftarrow u$ and $C(v) \leftarrow v$.

• spanner-edge : set to 1 if $(u, v)$ is selected as a spanner edge, to -1 if it is not to be
added to the spanner, and to 0 if no such decision has been made. So initially, this field is set to
0 for each edge.

• sampled-edge : set to 1 if either $u$ or $v$ belongs to a sampled cluster during an
iteration.

For each vertex $u \in V$, we store the following additional variables.

• $C(u)$ : the center of the cluster containing $u$ in the present clustering. Initially $C(u) \leftarrow u$.

• sampled($u$) : a boolean variable which is true during an iteration if $u$ belongs to a sampled
cluster.

• $\mathcal{N}(u)$ : the weight of the edge incident on $u$ from the nearest neighboring sampled cluster.
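
The augmented records can be pictured with the following minimal Python sketch. The names `EdgeRecord`, `VertexRecord` and `preprocess`, and the `(u, v, weight)` input tuples, are assumptions made for illustration; only the fields themselves come from the description above.

```python
from dataclasses import dataclass
import math


@dataclass
class EdgeRecord:
    u: int                  # vertex this occurrence is associated with
    v: int                  # the other endpoint
    weight: float
    lcenter: int            # center of u's cluster in the present clustering
    rcenter: int            # center of v's cluster in the present clustering
    spanner_edge: int = 0   # 1 = selected, -1 = discarded, 0 = undecided
    sampled_edge: int = 0   # 1 if u or v lies in a sampled cluster


@dataclass
class VertexRecord:
    u: int
    center: int                 # C(u)
    sampled: bool = False       # sampled(u)
    nearest: float = math.inf   # N(u); infinity until a sampled neighbor is found


def preprocess(edges):
    """Turn each undirected (u, v, w) into the two occurrences (u, v) and (v, u)."""
    out = []
    for u, v, w in edges:
        # Initial clustering is all singletons, so each endpoint is its own center.
        out.append(EdgeRecord(u, v, w, lcenter=u, rcenter=v))
        out.append(EdgeRecord(v, u, w, lcenter=v, rcenter=u))
    return out
```

Each record has a fixed number of fields of $O(\log n)$ bits (treating a weight as one word), so the augmentation changes the stream length only by a constant factor.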

The main idea is that, to process the various steps of an iteration, we sort the edges
and vertices in a suitable total order so that each task of the $i$th iteration can be executed by performing
a few Sort passes and a few Stream passes. We first introduce two total orders on the set of
edges.

1. $\prec_0$

An edge $(x, y)$ precedes $(p, q)$ in $\prec_0$ if

$\min(x, y) < \min(p, q)$, or $\min(x, y) = \min(p, q)$ and $\max(x, y) < \max(p, q)$.

2. $\prec_{(C,C')}$

Given two clusterings $C, C'$ on a set of vertices $V'$, we define an order $\prec_{(C,C')}$ on the set of
vertices $V'$ and edges $E'$ as follows.

• a vertex $u$ precedes a vertex $v$ in the total order $\prec_{(C,C')}$ if
$C(u) < C(v)$, or $C(u) = C(v)$ and $C'(u) < C'(v)$.

We break the tie, that is, $C(u) = C(v)$ and $C'(u) = C'(v)$, by comparing the labels $u$ and $v$.

• an edge $(u, v)$ precedes another edge $(x, y)$ in the order $\prec_{(C,C')}$ if
$C(u) < C(x)$, or $C(u) = C(x)$ and $C'(v) < C'(y)$.

We break the tie, that is, $C(u) = C(x)$ and $C'(v) = C'(y)$, by resorting to lexicographic
comparison of $(u, v)$ and $(x, y)$.

• a vertex $u$ precedes an edge $(x, y)$ in the order $\prec_{(C,C')}$ if $C(u) \le C(x)$.

Lemma 5.1 Suppose we want to arrange all the edges so that if there is an edge between two
vertices $u$ and $v$, then its two occurrences $(u, v)$ and $(v, u)$ occur together. This goal can be achieved
by sorting according to the order $\prec_0$.
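
The effect of Lemma 5.1 can be checked with a small Python sketch. It assumes integer vertex labels and a graph without parallel edges; the name `order0_key` is ours, and the built-in in-memory sort stands in for the Sort pass of the model.

```python
def order0_key(edge):
    """Sort key for the first order: compare by smaller endpoint, then larger."""
    u, v = edge
    return (min(u, v), max(u, v))


# Both occurrences of each undirected edge, in arbitrary stream order.
stream = [(3, 1), (2, 5), (1, 3), (5, 2), (1, 2), (2, 1)]

# After sorting, the two occurrences share a key and therefore sit side by side.
ordered = sorted(stream, key=order0_key)
```

Because the two occurrences of an edge receive identical keys, any comparison sort leaves them adjacent, which is what the later Stream passes rely on.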

We now state the following lemma, which highlights the importance of arranging edges
according to the order $\prec_{(C,C')}$.

Lemma 5.2 If the list of edges $E'$ is arranged according to the order $\prec_{(C,C')}$, then for any two
clusters $c \in C$, $c' \in C'$,

(i) the set of edges $\{(u, v) \mid u \in c\}$, i.e. the edges emanating from the cluster $c$, appears as a sub-list,
say $L_c$.

(ii) the set of edges $E'(c, c')$ appears as a sub-list within the sub-list $L_c$.

Corollary 5.1 If either $C$ or $C'$ is the clustering $\{\{u\} \mid u \in V\}$, then in the total order $\prec_{(C,C')}$, all
edges incident on a vertex $u$ appear together as a sub-list and immediately succeed the vertex $u$.
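
As an illustration of the second order, the following Python sketch builds a sort key for a mixed stream of vertices and edge occurrences. It assumes that clusterings are given as dictionaries mapping a vertex to the integer label of its cluster center, and that a vertex precedes the edges emanating from its own cluster; the function names are hypothetical.

```python
def cluster_order_key(item, C, Cp):
    """Key for a vertex u or an edge occurrence (u, v) under clusterings C, Cp."""
    if isinstance(item, tuple):            # edge occurrence (u, v)
        u, v = item
        return (C[u], 1, Cp[v], u, v)      # ties broken lexicographically
    return (C[item], 0, Cp[item], item)    # vertex; the 0 puts it before edges


def sort_stream(vertices, edges, C, Cp):
    """In-memory stand-in for one Sort pass over vertices and edges."""
    items = list(vertices) + list(edges)
    return sorted(items, key=lambda it: cluster_order_key(it, C, Cp))
```

With `C` set to the singleton clustering, every vertex is immediately followed by its own edges, and those edges are grouped by the cluster of the other endpoint, as Lemma 5.2 and Corollary 5.1 describe.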

5.2 Algorithm for $(2k-1)$-spanner in StreamSort model
Algorithm : 

As mentioned earlier, the algorithm will execute $k-1$ iterations. The $i$th iteration will begin with
a tuple $(V', E', E_S, C_{i-1})$, where $E_S$ is a partially built spanner and $E' \subseteq E$ consists of those edges
for which the decision of selecting into the spanner (or discarding) has not been made yet. Moreover, each
endpoint of an edge in $E'$ is present in $V'$, and the clustering $C_{i-1}$ partitions $V'$ into disjoint clusters
such that each edge in $E'$ is an inter-cluster edge and the property $P_{i-1}$ is satisfied.

Our algorithm does not do any processing on the edges of $E_S$ and processes only $E'$
and $V'$ in the stream. The various fields of the data structures associated with $E'$ and $V'$ store the
following information at the beginning of the $i$th iteration: the fields lcenter and rcenter of each edge
$(u, v) \in E'$ store $C_{i-1}(u)$ and $C_{i-1}(v)$ respectively. The sampled-edge field of each edge is reset,
and the sampled field of each vertex is also reset. $\mathcal{N}(v)$ of each vertex stores $\infty$.

We now present the four basic tasks of the $i$th iteration for computing a $(2k-1)$-spanner and
their execution in the StreamSort model.

1. Forming a sample of clusters :

Sample each cluster from $C_{i-1}$ independently with probability $n^{-1/k}$. However, if $i = k-1$,
then sample no cluster.

Execution in StreamSort model : Perform a sorting pass on the stream of vertices $V'$ and
edges $E'$ according to the order $\prec_{(C_{i-1},C_0)}$. Consequently, the vertices (and their edges)
belonging to the same cluster in $C_{i-1}$ appear together in the stream. We make a Stream pass on this
stream and do the following. We pick each cluster independently with probability $n^{-1/k}$,
set the field sampled of the vertices of the sampled clusters accordingly, and also set the field
sampled-edge of each edge emanating from them.

2. Finding nearest neighboring sampled clusters for vertices :

For each vertex not belonging to any sampled cluster, if it is adjacent to one or more sampled
clusters, compute the least weight edge incident from the nearest sampled cluster; let $\mathcal{N}(v)$
store the weight of this edge.

Execution in StreamSort model : We sort the stream according to $\prec_0$ so that for an edge
between $u$ and $v$, the two occurrences $(u, v)$ and $(v, u)$ appear together. We make a Stream
pass and if sampled-edge$(u, v)$ is set to 1, then we set sampled-edge$(v, u)$ to 1 as well. After
this, we sort the stream according to the order $\prec_{(C_0,C_0)}$. As a result, all edges incident
on a vertex $v$ appear contiguously in the stream. We process each
vertex $v \in V'$ in this stream as follows. If $v$ is not sampled, then we select the least weight
sampled-edge incident on it. If $(v, x)$ is such an edge, then we set $C(v) \leftarrow$ rcenter$(v, x)$ (so
$v$ gets assigned to the cluster containing $x$ in $C_i$), set spanner-edge$(v, x)$ to 1, and let $\mathcal{N}(v)$
store the weight of the edge $(v, x)$. However, if $v$ is not adjacent to any sampled-edge, we
set $\mathcal{N}(v)$ to $\infty$.

3. Adding edges to the spanner :

Each vertex $v$ not belonging to any sampled cluster does the following : For each cluster
$c \in C_{i-1}$ incident on $v$ with an edge of weight less than $\mathcal{N}(v)$, we select
the least weight edge from $E'(v, c)$ and mark it as a spanner edge.

Execution in StreamSort model : We perform a Sort pass on the stream according to the
order $\prec_{(C_0,C_{i-1})}$ so that all the edges incident on a vertex from the same cluster in the clustering
$C_{i-1}$ appear contiguously and a vertex immediately precedes all the edges incident on it. We
process a vertex $v$ in the stream as follows. For each cluster incident on $v$ with an edge of weight
less than $\mathcal{N}(v)$, we mark the least weight edge incident on $v$ from that cluster as a spanner edge
and mark the others as non-spanner edges.

We make a Sort pass over the stream of edges so that both occurrences of an edge are
together, and then delete both of them if either has its spanner-edge field set to -1.

4. Defining the clustering $C_i$ :

Keep only those vertices which belong to a sampled cluster or are adjacent to a sampled cluster.

Execution in StreamSort model : We make a Stream pass and delete all those vertices $v$ for
which sampled$(v) = 0$ and $\mathcal{N}(v) = \infty$. If a vertex $u$ belonged to a sampled cluster, then
it continues to belong to the same cluster. If it did not, then unless it is deleted, it was
adjacent to some sampled cluster and $C(u)$ was set to the center of its new cluster in the
second step. This defines a clustering $C_i$ for all the vertices of $V'$ which survive the $i$th
iteration. We now need to set the lcenter and rcenter fields of each edge according to the new
clustering $C_i$. We do so as follows. We make a Sort pass on the edges $E'$ and vertices $V'$
according to the order $\prec_{(C_0,C_0)}$. Consequently, all edges incident on a vertex $v$ appear
together. We set lcenter$(v, w)$ of each edge to $C(v)$ and reset rcenter$(v, w)$. We make a
Sort pass according to $\prec_0$ so that both occurrences of an edge appear together. We then
perform a Stream pass and for each pair of edges $(u, v)$ and $(v, u)$ that now appear consecutively,
we set rcenter$(u, v) \leftarrow$ lcenter$(v, u)$ and rcenter$(v, u) \leftarrow$ lcenter$(u, v)$.
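
The Stream passes of the four tasks above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a faithful StreamSort implementation: records are plain dictionaries keyed by the field names used in the text, the input lists are assumed to be already sorted by the appropriate Sort pass, all function names are hypothetical, and buffering a vertex's edges (as `min` and `list` do here) uses more than $O(\log n)$ bits.

```python
import math
import random
from itertools import groupby


def sample_clusters(stream, n, k, coin=random.random):
    """Task 1: records of one cluster are contiguous; flip one coin per cluster."""
    p = n ** (-1.0 / k)
    center, picked = None, False      # the only state carried between records
    for rec in stream:
        c = rec["center"] if rec["kind"] == "vertex" else rec["lcenter"]
        if c != center:               # first record of a new cluster
            center, picked = c, coin() < p
        if rec["kind"] == "vertex":
            rec["sampled"] = picked
        else:
            rec["sampled_edge"] = 1 if picked else 0
    return stream


def propagate_sampled(twins):
    """Task 2, first pass: adjacent twins (u, v), (v, u) share the flag."""
    for a, b in zip(twins[0::2], twins[1::2]):
        a["sampled_edge"] = b["sampled_edge"] = max(a["sampled_edge"], b["sampled_edge"])
    return twins


def join_nearest(vertex, incident):
    """Task 2, second pass: unsampled v joins via its lightest sampled-edge."""
    vertex["nearest"] = math.inf
    marked = [e for e in incident if e["sampled_edge"] == 1]
    if not vertex["sampled"] and marked:
        best = min(marked, key=lambda e: e["weight"])
        vertex["center"] = best["rcenter"]     # C(v) <- rcenter(v, x)
        vertex["nearest"] = best["weight"]     # N(v)
        best["spanner_edge"] = 1
    return vertex


def select_spanner_edges(vertex, incident):
    """Task 3: incident edges arrive grouped by rcenter (neighboring cluster)."""
    if vertex["sampled"]:
        return incident
    for _, group in groupby(incident, key=lambda e: e["rcenter"]):
        group = list(group)                    # buffered for clarity only
        lightest = min(group, key=lambda e: e["weight"])
        if lightest["weight"] < vertex["nearest"]:
            for e in group:
                e["spanner_edge"] = 1 if e is lightest else -1
    return incident


def refresh_rcenter(twins):
    """Task 4: with lcenter already updated, twins exchange it as rcenter."""
    for a, b in zip(twins[0::2], twins[1::2]):
        a["rcenter"], b["rcenter"] = b["lcenter"], a["lcenter"]
    return twins
```

Passing `coin` as a parameter is a design choice made here so that the sampling can be replayed deterministically; the text only requires independent coin flips with probability $n^{-1/k}$.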

It is clear that each step of the $i$th iteration is executed in the StreamSort model using a constant number
of Stream passes and Sort passes. Since the algorithm is a streaming version of the static RAM
algorithm, its correctness follows from the correctness of the latter. However, for the sake of completeness,
we now provide an overview of the correctness of the algorithm.

A simple inductive argument can be given to show that $P_i$ holds at the end of the $i$th iteration.
On this basis, it follows that for any edge $(u, v)$ that we delete from $E'$, there is a path in the spanner
$E_S$ with at most $(2i-1)$ edges joining $u$ and $v$. So at the end of the algorithm, the set $E_S$ will
indeed be a $(2k-1)$-spanner. Also note that the number of clusters in $C_{i-1}$ incident on a vertex with
edge weight less than that of its nearest neighboring sampled cluster is a geometric random variable with mean
$n^{1/k}$. Hence the expected number of spanner edges contributed by a vertex in an iteration is $O(n^{1/k})$.
Since there are $k-1$ iterations, the expected size of the spanner computed by the algorithm is
$O(kn^{1+1/k})$.

Theorem 5.1 Given any $k \in \mathbb{N}$, a $(2k-1)$-spanner of expected size $O(\min(m, kn^{1+1/k}))$ for a
weighted graph can be computed in the StreamSort model with $O(\log n)$ bits of working memory and
$O(k)$ sort passes and stream passes. Furthermore, it requires constant processing time per edge
during each stream pass.
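
The size bound in Theorem 5.1 can be written out as a short calculation. This is a sketch that takes the per-vertex, per-iteration bound $O(n^{1/k})$ from the argument preceding the theorem as given, and uses the fact that the spanner is a subgraph, so it never has more than $m$ edges.

```latex
\mathbb{E}\bigl[\,|E_S|\,\bigr]
  \;\le\; \sum_{i=1}^{k-1} \sum_{v \in V} O\bigl(n^{1/k}\bigr)
  \;=\; O\bigl(k\, n^{1+1/k}\bigr),
\qquad
|E_S| \;\le\; |E| \;=\; m .
```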

6 Conclusion and open problems 

We presented a single pass algorithm for computing a $(2k-1)$-spanner of expected size $O(kn^{1+1/k})$
with $O(m)$ processing time for the entire stream (amortized constant processing time per edge).
We also showed that in the StreamSort model, the algorithm can be extended to weighted graphs as
well and requires one sort pass followed by a stream pass. However, the working memory in
both these algorithms is of the order of the size of the spanner, which, though optimal for the classical streaming
model, is very large. We then provided an algorithm for computing a spanner in the StreamSort model
with $O(\log n)$ bits of working memory and $O(k)$ passes. It can be seen that these algorithms achieve
optimal or near optimal performance in all aspects (number of passes, amortized processing time
per edge, working memory size) in both models. One aspect which is not truly optimal is the expected
size of the spanner, which is away from the conjectured lower bound by a factor of at most $k$. An
important open question is : Can we get rid of the multiplicative factor $k$ from the size $O(kn^{1+1/k})$
of the $(2k-1)$-spanner computed in the streaming model? Note that this factor is present in the case of the
static randomized algorithm as well. So either a more careful and involved analysis of the randomized
algorithm would be required or some fundamentally new approach should be pursued to answer this
question.